AdSpark: A Multimodal Generative AI Framework for Automated Digital Advertising and Marketing Asset Synthesis

Authors: Saurabh Keni

DOI Link: https://doi.org/10.22214/ijraset.2026.78444

Abstract

Generative artificial intelligence is changing how digital marketing content is produced. This paper presents the design and implementation of AdSpark, a multimodal generative platform that automates the production of professional advertising creatives and marketing strategies. By integrating large language models (LLMs) and vision-language models, specifically Google Gemini, AdSpark lets users generate high-fidelity advertising posters and matching marketing copy from minimal product inputs. The system uses a full-stack architecture comprising Next.js for the frontend, Express.js for the backend, and Prisma with PostgreSQL for persistent state management. Key features include automated image synthesis via multimodal prompts, AI-driven tagline generation, social media captioning, and target audience profiling. Experimental evaluation indicates that AdSpark reduces creative production time by over 90% compared to traditional manual workflows. The work offers a scalable architectural blueprint for making high-end advertising production widely accessible through specialized AI orchestration.

Introduction

Creating professional marketing content is costly and time-consuming, especially for small and medium-sized enterprises (SMEs). AdSpark uses generative AI and multimodal large language models (MLLMs) to automate the end-to-end marketing creative process, including image enhancement, copywriting, and campaign strategy. The platform calls vision and language models (Gemini) through a parallel multimodal workflow and is built on a React (Next.js), Node.js (Express.js), and PostgreSQL stack, which allows campaign assets to be generated, stored, and retrieved in real time.
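
The parallel multimodal workflow described above can be illustrated with a minimal TypeScript sketch. It is not AdSpark's actual code: the interfaces, field names, `generateCampaign`, and the stub clients are all hypothetical, and the model calls are abstracted as plain async functions so that the example does not depend on any particular Gemini SDK. It only shows the orchestration idea, which is to start the vision and language branches together and wait for all of them.

```typescript
// Hypothetical input and output shapes; the field names are illustrative,
// not AdSpark's real schema.
interface ProductInput {
  name: string;
  description: string;
}

interface CampaignAssets {
  posterUrl: string;
  tagline: string;
  caption: string;
  audience: string;
}

// Each model call is modeled as an async function of the product input.
type ModelCall<T> = (product: ProductInput) => Promise<T>;

interface ModelClients {
  generatePoster: ModelCall<string>;  // vision branch: resolves to an image URL
  generateTagline: ModelCall<string>; // language branches
  generateCaption: ModelCall<string>;
  profileAudience: ModelCall<string>;
}

// Start every branch at once and wait for all of them.
// Promise.all rejects if any branch rejects, so the caller receives either
// a complete asset set or an error.
async function generateCampaign(
  product: ProductInput,
  clients: ModelClients
): Promise<CampaignAssets> {
  const [posterUrl, tagline, caption, audience] = await Promise.all([
    clients.generatePoster(product),
    clients.generateTagline(product),
    clients.generateCaption(product),
    clients.profileAudience(product),
  ]);
  return { posterUrl, tagline, caption, audience };
}

// Stub clients standing in for real model calls (assumption: a real
// deployment would wrap the model provider's API here).
const stubClients: ModelClients = {
  generatePoster: async (p) =>
    `https://example.invalid/posters/${encodeURIComponent(p.name)}.png`,
  generateTagline: async (p) => `${p.name}: a tagline placeholder`,
  generateCaption: async (p) => `Meet ${p.name}. ${p.description}`,
  profileAudience: async () => "placeholder audience profile",
};
```

Because the branches run concurrently, the total latency in this sketch is bounded by the slowest branch rather than the sum of all four. If partial results are acceptable when one branch fails, `Promise.allSettled` would be the alternative design choice.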

In testing, AdSpark produced high-quality, brand-consistent marketing materials in seconds. Creation was over 99% faster, and cheaper, than with manual workflows, while the visual and textual output remained at a professional level. These results suggest that AI orchestration platforms can substantially change content creation and marketing automation.

Conclusion

AdSpark demonstrates that multimodal generative AI can automate the complex process of marketing asset production. By consolidating vision and language tasks into a single, credit-managed workflow, the platform lowers the technical and financial barriers to professional-level advertising. Future iterations will focus on dynamic video ad generation and A/B testing integration, allowing users to optimize their generated content based on real-world performance metrics.
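
The source does not describe how credits are tracked, so the following TypeScript sketch is only one possible reading of a credit-managed workflow. `CreditLedger` and `runWithCredits` are hypothetical names, an in-memory `Map` stands in for the Prisma and PostgreSQL persistence layer, and the refund-on-failure policy is an assumption rather than documented AdSpark behavior.

```typescript
// In-memory stand-in for a persistent credit table (assumption: the real
// system would store balances in PostgreSQL and update them transactionally).
class CreditLedger {
  private balances = new Map<string, number>();

  balance(userId: string): number {
    return this.balances.get(userId) ?? 0;
  }

  grant(userId: string, amount: number): void {
    this.balances.set(userId, this.balance(userId) + amount);
  }

  // Deduct credits, or throw if the user cannot afford the operation.
  charge(userId: string, cost: number): void {
    const current = this.balance(userId);
    if (current < cost) {
      throw new Error("insufficient credits");
    }
    this.balances.set(userId, current - cost);
  }
}

// Charge before running the generation task and refund if the task fails,
// so that users only pay for generations that complete.
async function runWithCredits<T>(
  ledger: CreditLedger,
  userId: string,
  cost: number,
  task: () => Promise<T>
): Promise<T> {
  ledger.charge(userId, cost);
  try {
    return await task();
  } catch (err) {
    ledger.grant(userId, cost); // refund the charge on failure
    throw err;
  }
}
```

Charging before the task starts means a user with too few credits is rejected before any model call is made. With a real database, the check and the deduction would need to happen atomically to stay correct under concurrent requests.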

Copyright

Copyright © 2026 Saurabh Keni. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Paper Id : IJRASET78444

Publish Date : 2026-03-17

ISSN : 2321-9653

Publisher Name : IJRASET
